ROUGE: A Package For Automatic Evaluation Of Summaries

نویسنده

  • Chin-Yew Lin
چکیده

ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units such as n-gram, word sequences, and word pairs between the computer-generated summary to be evaluated and the ideal summaries created by humans. This paper introduces four different ROUGE measures: ROUGE-N, ROUGE-L, ROUGE-W, and ROUGE-S included in the ROUGE summarization evaluation package and their evaluations. Three of them have been used in the Document Understanding Conference (DUC) 2004, a large-scale summarization evaluation sponsored by NIST.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Feature Selection for Summarising: The Sunderland DUC 2004 Experience

In this paper we describe our participation in task 1-very short single-document summaries in DUC 2004. The task chosen is related to our research project, which aims to produce abstracting summaries to improve search engine result summaries. DUC allowed us to produce summaries no longer than 75 characters, therefore we focused on feature selection to produce a set of key words as summaries ins...

متن کامل

Older versions of the ROUGEeval summarization evaluation system were easier to fool

We show some limitations of the ROUGE evaluation method for automatic summarization. We present a method for automatic summarization based on a Markov model of the source text. By a simple greedy word selection strategy, summaries with high ROUGE-scores are generated. These summaries would however not be considered good by human readers. The method can be adapted to trick different settings of ...

متن کامل

Looking for a Few Good Metrics: Automatic Summarization Evaluation - How Many Samples Are Enough?

ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation. It includes measures to automatically determine the quality of a summary by comparing it to other (ideal) summaries created by humans. The measures count the number of overlapping units such as n-gram, word sequences, and word pairs between the computer-generated summary to be evaluated and the ideal summaries created by humans...

متن کامل

ROUGE 2.0: Updated and Improved Measures for Evaluation of Summarization Tasks

Evaluation of summarization tasks is extremely crucial to determining the quality of machine generated summaries. Over the last decade, ROUGE has become the standard automatic evaluation measure for evaluating summarization tasks. While ROUGE has been shown to be effective in capturing n-gram overlap between system and human composed summaries, there are several limitations with the existing RO...

متن کامل

Discrepancy Between Automatic and Manual Evaluation of Summaries

Today, automatic evaluation metrics such as ROUGE have become the de-facto mode of evaluating an automatic summarization system. However, based on the DUC and the TAC evaluation results, (Conroy and Schlesinger, 2008; Dang and Owczarzak, 2008) showed that the performance gap between humangenerated summaries and system-generated summaries is clearly visible in manual evaluations but is often not...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004